Corpus and dictionary development for classifiers/quantifiers towards a French-Japanese machine translation

Authors

  • Mutsuko Tomokiyo
  • Christian Boitet

Abstract

Although quantifier/classifier expressions occur frequently in everyday communication and in written documents, they are not described in classical bilingual paper dictionaries, nor in machine-readable dictionaries. This paper describes the development of a corpus and a dictionary for quantifiers/classifiers, and their use in the framework of French-Japanese machine translation (MT). These expressions often cause problems of lexical ambiguity and of set-phrase recognition during analysis, in particular for a linguistically distant language pair such as French and Japanese. To develop a dictionary aimed at resolving ambiguity in expressions containing quantifiers and classifiers that may be ambiguous with common nouns, we annotated our corpus with UWs (interlingual lexemes) of UNL (Universal Networking Language) found in the UNL-jp dictionary. Potential classifiers/quantifiers are extracted from the corpus with the UNLexplorer web service.
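As a rough illustration of the extraction and ambiguity problem described above, the following Python sketch flags words that occur in a French "numeral + word + de" frame and maps them to candidate UWs, keeping only a unit-sense reading in that frame. It is a hypothetical toy: the dictionary entries, the UW strings with an `icl>unit` constraint, the regular expression, and the function names are all assumptions made for illustration. It does not reproduce the UNLexplorer web service or the contents of the UNL-jp dictionary.

```python
import re
from collections import Counter

# Hypothetical toy dictionary: French lemma -> candidate UWs.
# Entries and UW strings are illustrative, not taken from UNL-jp.
UW_CANDIDATES = {
    "tranche": ["slice(icl>thing)", "slice(icl>unit)"],
    "tasse": ["cup(icl>thing)", "cup(icl>unit)"],
    "verre": ["glass(icl>thing)", "glass(icl>unit)"],
}

# Assumed quantifier frame: a numeral, one word, then "de" or "d'".
NUMERAL = r"(?:\d+|un|une|deux|trois|quatre|cinq)"
FRAME = re.compile(rf"\b{NUMERAL}\s+(\w+)\s+(?:de\b|d')", re.IGNORECASE)


def to_lemma(word: str) -> str:
    """Crude singularization: drop a final 's' if that yields a known entry."""
    word = word.lower()
    if word.endswith("s") and word[:-1] in UW_CANDIDATES:
        return word[:-1]
    return word


def extract_candidates(sentences):
    """Count dictionary words that appear inside the quantifier frame."""
    counts = Counter()
    for sentence in sentences:
        for match in FRAME.finditer(sentence):
            lemma = to_lemma(match.group(1))
            if lemma in UW_CANDIDATES:
                counts[lemma] += 1
    return counts


def unit_readings(lemma: str):
    """Keep only the (hypothetical) unit-sense UWs for a framed lemma."""
    return [uw for uw in UW_CANDIDATES.get(lemma, []) if "icl>unit" in uw]


if __name__ == "__main__":
    corpus = [
        "Il a mangé deux tranches de pain.",
        "Elle boit une tasse de thé.",
        "Le verre est cassé.",
        "Trois verres d'eau, s'il vous plaît.",
    ]
    for lemma, n in extract_candidates(corpus).items():
        print(lemma, n, unit_readings(lemma))
```

On this toy corpus, "verre" in "Le verre est cassé." is not counted because it lacks the quantifier frame, which mirrors, in a very simplified way, the noun-versus-classifier ambiguity the paper targets. A real system would rely on morphological analysis and a full dictionary rather than a single regular expression.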

Similar resources

Rule-based Translation of Quantifiers for Chinese-Japanese Machine Translation

Quantifiers and numerals often cause mistakes in Chinese-Japanese machine translation. In this paper, an approach to quantifier translation is proposed based on the syntactic features after classification. First, morphological analysis is performed on sentences extracted from a Chinese-Japanese aligned corpus, which consists of quantifiers and numerals. Next, statistical information is obtained...

Sub-Sentential Alignment Method by Analogy

This paper describes a method for searching word correspondences between pairs of translation sentences. In Example-Based Machine Translation, translation patterns can be extracted easily if word correspondences between pairs of translation sentences are defined. The popular methods for aligning bilingual corpora at a sub-sentential level are unable to produce reliable results when the size of...

Non-Compositional Language Model and Pattern Dictionary Development for Japanese Compound and Complex Sentences

To realize high quality machine translation, we proposed a Non-Compositional Language Model, and developed a sentence pattern dictionary of 226,800 pattern pairs for Japanese compound and complex sentences consisting of 2 or 3 clauses. In pattern generation from a parallel corpus, Compositional Constituents that could be generalized were 74% of independent words, 24% of phrases and only 15% of ...

Translation By Machine Of Complex Nominals: Getting It Right

We present a method for compositionally translating noun-noun (NN) compounds, using a word-level bilingual dictionary and syntactic templates for candidate generation, and corpus and dictionary statistics for selection. We propose a support vector learning-based method employing target language corpus and bilingual dictionary data, and evaluate it over an English-Japanese machine translation tas...

Multi-lingual Sentence Generation from the PIVOT Interlingua

This paper proposes a strategy for French and Spanish sentence generation systems, based on the English generation system. The English generation model consists of four procedures: conceptual wording (sentence-structure planning), syntactic selection, ordering and morphological generation. The analysis of linguistic similarities and differences between English, French and Spanish reveals that a...

Publication date: 2016